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This paper provides a survey of modern nonlinear filtering methods for attitude estima- 
tion. Early applications relied mostly on the extended Kalman filter for attitude estimation. 
Since these applications, several new approaches have been developed that have proven to 
be superior to the extended Kalman filter. Several of these approaches maintain the basic 
structure of the extended Kalman filter, but employ various modifications in order to pro- 
vide better convergence or improve other performance characteristics. Examples of such 
approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman 
filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters 
that propagate and update a discrete set of sigma points rather than using linearized equa- 
tions for the mean and covariance are also reviewed. A two-step approach is discussed 
with a first-step state that linearizes the measurement model and an iterative second step 
to recover the desired attitude states. These approaches are all based on the Gaussian 
assumption that the probability density function is adequately specified by its mean and 
covariance. Other approaches that do not require this assumption are reviewed, including 
particle filters and a Bayesian filter based on a non- Gaussian, finite-parameter probability 
density function on SO (3). Finally, the predictive filter, nonlinear observers and adap- 
tive approaches are shown. The strengths and weaknesses of the various approaches are 
discussed. 


I. Introduction 

The extended Kalman filter 1-7 (EKF) is the workhorse of real-time spacecraft attitude estimation. Since 
the group SO (3) of rotation matrices has dimension three, most attitude determination EKFs use lower- 
dimensional attitude parameterizations than the nine- parameter attitude matrix itself. The fact that all 
three-parameter representations of SO (3) are singular or discontinuous for certain attitudes 8 has led to ex- 
tended discussions of constraints and attitude representations in EKFs. 6,7,9-11 These issues are now well 
understood, however, and the EKF, especially in the form known as the multiplicative extended Kalman 
filter 5-7 (MEKF), has performed admirably in the vast majority of attitude determination applications. Nev- 
ertheless, poor performance or even divergence arising from the linearization implicit in the EKF has led to 
the development of other filters. Several of these approaches retain the basic structure of the EKF, such as 
additive EKF approaches, 12-17 a backwards-smoothing EKF, 18 and deterministic EKF-like estimators. 19-21 
In particular, the backwards-smoothing EKF solves a nonlinear smoothing problem for the current and past 
sample intervals using iterative numerical techniques. The numerical iteration retains all of the nonlinear- 
ities of a fixed number of stages that precede the terminal stage of interest, and it processes information 
from earlier stages in an approximate manner. 18 Deterministic EKF-like estimators are closely related to 
Hoo control design. 22 Usually an upper bound is derived first, and then the bound is minimized based on 
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approaches such as Riccati equations and linear matrix inequalities. The nonlinear problem is much more 
difficult, which requires the solution of the Hamilton- Jacobi-Isaacs partial differential inequality. For attitude 
estimation, small-error approximations are used to developed a filter. 19,20 

Other designs use various assumptions to derive simplified filters. These generally provide suboptimal 
performance characteristics in relation to the EKF, but involve linear or pseudo-linear equations that are 
used to estimate the states of a nonlinear dynamical system. Therefore, linear design and analysis tools 
can be used to construct the filter and assess its overall performance. Some of these use a deterministic 
solution of the attitude, e.g. methods that are based on the QUEST attitude determination solution. 23 
Simple filter designs based on QUEST include filter QUEST 24 and recursive QUEST. 25 A more complicated 
but far more robust approach, called extended QUEST, 26 uses a full nonlinear propagation along with a 
novel measurement update. This approach can be used to estimate attitude and additional parameters as 
well. Interlaced filters that replace a nonlinear filter with two or more linear filters have been used for rate 
estimation, but not for attitude estimation. 27-29 

Several new alternatives to the standard EKF have been recently introduced, such as sigma point or 
unscented filters 30-33 and particle filters. 34-37 Unscented filters (UFs) have been shown to exhibit several 
advantages over the EKF, including: 1) the expected error is lower than the EKF, 2) they can be applied 
to non-differentiable functions, 3) no Jacobian matrix calculations are required, and 4) they provide higher- 
order expansions than the standard EKF. Unscented filters work on the premise that with a fixed number 
of parameters it should be easier to approximate a Gaussian distribution than to approximate an arbitrary 
nonlinear function. They typically use the standard Kalman form in the post-update, but use a different 
propagation of the covariance and pre-measurement update with no local iterations. The attitude estima- 
tion UF derived in Ref. 33 is based on a quaternion representation of the attitude kinematics. However, 
straightforward implementation of the standard UF equations derives a predicted quaternion mean from an 
averaged sum of quaternions. Therefore, no guarantees can be made that the resulting quaternion will have 
unit norm. This was overcome by using a generalized unconstrained three-component vector to represent 
the attitude error-quaternion, leading to an unconstrained formulation in the UF design. 

Unscented filters are essentially based on second or higher order approximations of nonlinear functions, 
which are used to estimate the mean and covariance of the state vector. Though the mean and covariance are 
the sufficient statistics of a Gaussian distribution, they are not sufficient to represent a general probability 
distribution. When UF methods are applied to strongly nonlinear and non-Gaussian estimation problems, 
where the a posteriori distribution of the state vector may be multi-peaked, heavily-t ailed, or skewed, desired 
performance characteristics may not be obtained. This may be overcome by using particle filters 34-37 (PFs). 
Like other approximate approaches to optimal filtering, the ultimate objective of PFs is to construct the a 
posteriori probability density function (PDF) of the state vector, or the PDF of the state vector conditional 
on all the available measurements. However, the approximation of PFs is vastly different from that of con- 
ventional nonlinear filters. The central idea of a PF approximation is to represent a continuous distribution 
of interest by a finite (but large) number of weighted random samples of the state vector, or particles. A 
PF does not assume the a posteriori distribution of the state vector to be a Gaussian distribution or any 
other distribution of known form. In principle, it can estimate probability distributions of arbitrary form 
and solve any nonlinear and/or non-Gaussian system. Reference 36 presents a PF for attitude estimation 
based on the bootstrap filter. 35 

The optimal solution of the nonlinear estimation problem requires the propagation of the conditional PDF 
of the state given the observation history. 37 All practical nonlinear filters are approximations to this ideal. 
Exact finite dimensional filters 38 can be found that solve some nonlinear problems by using the Fokker-Planck 
equation 2,39 to propagate a non-Gaussian PDF between measurements and Bayes’ formula 1 ’ 2 to incorporate 
measurement information. A recently proposed filter 40 follows this pattern, but does not solve the nonlinear 
attitude filtering problem exactly. We refer to it as an orthogonal filter, because it represents the attitude 
by an orthogonal rotation matrix, rather than by some parameterization of the rotation matrix. The PDF 
is a non-Gaussian function defined on the Cartesian product of SO(3), the group of rotation matrices, 
and the Euclidean space R N of bias parameters. This filter entirely avoids questions about singularities of 
representations or covariance matrices arising in EKFs 6-10 and UFs, 33 and has the additional advantage of 
providing a consistent initialization for a completely unknown initial attitude, owing to the fact that SO(3) 
is a compact space. 

Nonlinear observers often exhibit global convergence, which is to say that they can converge from any 
initial guess. 41 Several applications of observers for attitude control have been proposed Refs. 42-48. A 
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nonlinear observer and controller using only measurements of roll, pitch and yaw has been developed in 
Ref. 43. Local asymptotic stability is ensured under mild hypotheses. A globally convergent, nonlinear full- 
order observer using quaternions and Euler’s equations for the dynamics has been derived in Ref. 44. The 
observer structure contains a discontinuous term, which is often associated with “sliding mode” observers. 
The error-quaternion is defined using a multiplicative approach and stability is proven using a Lyapunov 
function. A simpler, robust smoothed sliding mode observer that avoids quaternion error differentiation noise 
and eliminates the necessity of measuring angular rate is derived in Ref. 45. Although an additive approach is 
used to define the quaternion error, global stability is still provided through a Lyapunov function. Algrain and 
Lee develop a nonlinear observer to estimate angular rates along the third axis of a spinning spacecraft using 
only two-axis measurements. 46 A pseudo-linear model is developed by decomposing the nonlinear system 
into linear and nonlinear parts. Boskovic, Li and Mehra use angular rate measurements with quaternion 
kinematics to derive a nonlinear bias observer, which is coupled with an adaptive sliding mode controller. 47 
Stability is proven as long as the attitude never passes through ±180 rotations. Thienel and Sanner develop 
an exponentially convergent nonlinear observer given a constant gyro bias with identification of the bias 
proven through a persistency of excitation argument. 48 An analysis is also shown that includes gyro noise. 

Adaptive approaches generally fall into two categories. One category encompasses approaches that adap- 
tively tune the Kalman filter through the identification of either the process noise covariance or measurement 
noise covariance, or both simultaneously. In practice “tuning” a Kalman filter can be arduous and very time- 
consuming. Usually, the measurement-error covariance is fairly well known, derived from statistical inferences 
of the hardware sensing device. However, the process noise covariance is usually not well known and is of- 
ten derived from experiences gained by the design engineer based on intimate knowledge of the particular 
system. The approach is based on “residual whitening.” 49 Unfortunately, most noise adaptive techniques 
are applicable only for linear systems, 1 which creates problems for attitude estimation due to the nonlinear 
equations involved. Still, it is possible to use these techniques with linearized equations, as demonstrated in 
Ref. 50. Lam and Wu further develop adaptive filters that address both colored and white noise statistics. 51 
The former noise is identified using a non-parametric neural network approach, while the latter noise is 
identified using an a- (3 filter. An adaptive filter is also proposed in Ref. 17 to account for inaccuracy in 
the knowledge of the process noise statistical model, which uses a linear pseudo-measurement model. Other 
adaptive approaches use adaptive methods for fault tolerant estimation purposes. 52,53 The other category 
includes approaches that adaptively estimate unknown system parameters, such as the inertia matrix. These 
generally fall into two basic categories: 1) parameter estimation or filter-based methods, and 2) nonlinear 
adaptive techniques. Least squares methods to determine the inertia matrix and other constant parameters, 
such as disturbance model parameters and biases, are shown in Refs. 54-56. A disturbance accommodation 
technique that models the unknown disturbance angular rate using a power set of . time as basis functions 
is shown in Ref. 57. Nonlinear adaptive techniques are similar to nonlinear observers in that they usually 
provide global stability proofs that guarantee convergence of the estimated parameters. 58 ' 60 

This paper will review the basic assumptions of these filters, presenting enough mathematical detail 
to give a general orientation. First, reviews of the quaternion parameterization and gyro model equations 
are given. Then, attitude estimation methods based on the EKF are shown, followed by QUEST-based 
approaches. Next, the two-step estimator is shown. The UF and PF approaches are then shown, followed 
by the orthogonal filter. Then, the predictive filter, as well as nonlinear observers and adaptive approaches 
are reviewed. The paper concludes with a discussion of the strengths and weaknesses of the various filters. 

II. The Quaternion Parameterization and Gyro Model 

The attitude of a vehicle is defined as its orientation with respect to some reference frame. If the 
reference frame is non-moving, then it is commonly referred to as an inertial frame. To describe the attitude 
two coordinate systems are usually defined: one on the vehicle body and one on the reference frame. For 
most dynamical applications these coordinate systems have orthogonal unit vectors that follow the right- 
hand rule. The attitude matrix (A), often referred to as the direction cosine matrix or rotation matrix, maps 
one frame to another. The attitude matrix is an orthogonal matrix, i.e. its inverse is given by its transpose, 
and proper, i.e. its determinant is ±1. 61 For spacecraft applications the attitude mapping is usually applied 
from the reference frame to the vehicle body frame. Mathematically, the mapping from the reference frame 
to the body frame is given by 

b = Ar (1) 
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where b is the body-frame vector and r is the reference- frame vector. 

Several parameterizations of the attitude are possible . 62 Minimal parameterizations, such as the Euler 
angles, the Rodrigues parameters (Gibb’s vector), and the modified Rodrigues parameters (MRPs), are often 
avoided in filter designs for the “global attitude” due to their associated singularities . 8 They are often used 
to define the “local error attitude” though, as is discussed later. For modern-day applications, i.e. since the 
early 1980s, the quaternion 63 has been the most widely used attitude parameterization. The quaternion is 
a four-dimensional vector, defined as 


q = 


with 


Q = [qi <72 < 73 ] r = esin(i?/2) 
<74 = cos(#/ 2 ) 


( 2 ) 


(3a) 

(3b) 


where e is the unit Euler axis and d is the rotation angle. Since a four-dimensional vector is used to describe 
three dimensions, the quaternion components cannot be independent of each other. The quaternion satisfies 
a single constraint given by q T q = 1 . The attitude matrix is related to the quaternion by 


- 4 (q) = (<74 - Hell 2 ) ^ 3 X 3 +2QQ T -2q 4 [gx} = H r (q)^(q) 
where I 3x3 is a 3 x 3 identity matrix and 


2(q) = 
*(q)s 

Also, [^x] is the cross-product matrix defined by 

[<?x] = 


<74/3x3 + [ex] 

T 

-Q 

<?4^3x3 ” [ffX] 


0 <72 

q 3 0 -<71 

~q2 <7i 0 


( 4 ) 

(5a) 

(5b) 


(6) 


For small angles the vector part of the quaternion is approximately equal to half angles , 62 which will be used 
later. 

The quaternion kinematics equation is given by 


q= = -^Mq 


where u; is the three-component angular rate vector and 


fl(u;) 


ux 


-or 


OJ 


A useful identity is given by 
where 


= T(a;)q 


FM = 


wx 






0 


( 7 ) 


(8) 


( 9 ) 


(10) 


A major advantage of using the quaternion is that the kinematics equation is linear in the quaternion 
and is also free of singularities. Another advantage of the quaternion is that successive rotations can be 
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accomplished using quaternion multiplication. Here we adopt the convention of Ref. 6, who multiply the 
quaternions in the same order as the attitude matrix multiplication, in contrast to the usual convention 
established by Hamiliton. 63 Suppose we wish to perform a successive rotation. This can be written using 

^(q')^(q) = A( q' <g> q) (11) 

The composition of the quaternions is bilinear, with 


>q = 


'J'(q') q' 


q = 


3(q) q 


(12) 


Also, the inverse quaternion is defined by 



(13) 


Note that q<g> q -1 = [0 0 0 1] T , which is the identity quaternion. A computationally efficient algorithm to 
extract the quaternion from the attitude matrix is given in Ref. 64. A more thorough review of the quaternion 
parameterization, as well as other parameterizations, can be found in the survey paper by Shuster 62 and in 
the book by Kuipers. 65 

A common sensor that measures the angular rate is a rate-integrating gyro. For this sensor, a widely 
used three-axis continuous-time model is given by 66 


d> = w + (3 + rj v 
P = Vu 


(14a) 

(14b) 


where u> is the measured rate, (3 is the drift, and r] v and 7] u are independent zero-mean Gaussian white-noise 
processes with 


E {Vv(t)vZ ( t)} =al5(t-T)I 3X 3 (15a) 

E {Vu{t)vl{T)} = - t)I 3X 3 (15b) 


where £{•} denotes expectation and S(t — r) is the Dirac delta function. A more general gyro model includes 
scale factors and misalignments, which can also be estimated in real time. 67,68 For simulation purposes, 
discrete-time gyro measurements can be generated using the following equations: 69 


d>A:+l = 1 + 2^ +1 + ^ + 


r f£ , J_ _2 At 

At + 12 uAt 


1 1/2 




Pk+i =Pk + °u At 1/2 N U 


(16a) 

(16b) 


where the subscript k denotes the k th time-step, At is the gyro sampling interval, and and N u are 
zero-mean Gaussian white-noise processes with covariance each given by the identity matrix. 


III. Extended Kalman Filter 

The most straightforward way to attack a nonlinear estimation problem is to linearize about the current 
best estimate. This leads, of course, to the EKF, 3 which is the workhorse of satellite attitude determination. 
There are several different implementations of the attitude EKF, depending on both the attitude repre- 
sentation 62 used in the state vector and the form in which observations are input. It is a well-known fact 
that all globally continuous and nonsingular representations of the rotations have at least one redundant 
component, 8 so we are faced with the alternatives of using an attitude representation that is either singular 
or redundant. The various strategies to face or evade this dilemma can be divided into three general classes, 
which we will refer to as the minimal representation EKF, the multiplicative EKF (MEKF) and the additive 
EKF (AEKF). Reference 6 presents an overview of Kalman filtering for spacecraft attitude estimation, em- 
phasizing the quaternion representation, with a complete list of references through 1981. This section will 
provide a brief overview, emphasizing developments since 1981. 
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A. Observation Preprocessing in an EKF 

Almost all attitude measurements can be converted to unit vectors: centroids in a star tracker’s focal plane 
to star unit vectors, horizon sensor measurements to a nadir- pointing unit vector, or triaxial magnetometer 
measurements to a unit vector along the magnetic field, for example. It has proved convenient, therefore, 
to develop a standard unit vector interface for the EKF, resulting in the unit vector filter (UVF). 70,71 The 
UVF employs also a very useful approximation to the errors in the unit vector measurements. In particular, 
Shuster and Oh 23 have shown that nearly all the probability of the errors is concentrated on a very small 
area about the direction of At, so the sphere containing that point can be approximated by a tangent plane, 
characterized by 

b = At -f v, v t At = 0 (17) 

where b denotes the measurement and the sensor error v is approximately Gaussian, which satisfies 

E{v} = 0 (18a) 

R=E {vv T } = cr 2 [/ 3 x 3 - (^r)(^lr) T ] (18b) 

Equation (18b) is known as the QUEST measurement model. 23,70,71 This model is quite accurate for small 
field-of-view sensors. The approximations in this error model are discussed in Ref. 72 and 73. Equation 
(18b) gives a rank- deficient R matrix, which would appear to give rise to problems for the EKF, but Shuster 
has shown that the simpler, full-rank form 

R = CT 2 I 3 X3 (19) 

gives equivalent results. 70 The QUEST measurement model has been expanded for large field-of-views in 
Ref. 73. 

Using QUEST 23 or an equivalent quaternion estimator as a data compressor simplifies the interface even 
further. This is especially useful since many modern star trackers compute a quaternion from multiple star 
vectors, and the quaternion output from the star tracker provides a convenient “measurement” for input to 
an EKF. 7,70,74 

B. Minimal Representation EKF 

The rotation group has three dimensions, so the most straightforward implementation of an EKF employs 
a three-dimensional parameterization of the attitude. The earliest published attitude EKF used the 1-2- 
3 sequence of Euler angles. 4 It is well known that these angles have a “gimbal lock” singularity when the 
magnitude of the middle angle is 90 degrees, so this form of the EKF is most appropriate when the spacecraft 
does not stray too far from a reference attitude. A good example is an Earth-pointing spacecraft, with the 
attitude being defined with respect to a local- vertical /local- horizontal coordinate frame. If the gimbal lock 
condition arises, the coordinate axes to which the vehicle attitude is referenced must be repeatedly shifted 
to avoid singularity. 4 

Euler angles are inappropriate for agile spacecraft, such as astronomical observatories. Minimal repre- 
sentation EKFs employing the Rodrigues parameters 75 and the MRPs have been developed for this applica- 
tion. 76 The MRPs are nonsingular for rotations less than 360 degrees, and the singularity can be avoided by 
changing to a “shadow set” of parameters. 77 This form of the EKF has not found wide application, however. 

C. Multiplicative EKF 

The MEKF represents the attitude as the product of an estimated attitude and a deviation from that 
estimate. A nonsingular representation of the estimated attitude and a three-parameter representation of 
the deviation are employed. The most usual implementation uses the quaternion representation for the 
attitude. 5-7, 78-80 In this case the product is 


q = <5q(0)®q (20) 

where q is the unit estimated quaternion and <5q(0) is a unit quaternion representing the rotation from q 
to the true attitude q, parameterized by a three-component vector 0. 

An alternative formulation, which has some advantages, reverses the order of multiplication in Eq. (20) 
so that 0 represents the attitude errors in the inertial reference frame rather than in the body frame. 81-83 
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It is also possible to represent the reference attitude by an estimated attitude matrix A rather than by an 
estimated quaternion. 7,81,84,85 This requires more parameters, but may save computations if the attitude 
matrix is explicitly required. An argument in favor of the quaternion is that it is easy to restore normalization 
that may be lost due to numerical errors, while restoring the orthogonality of A is nontrivial. Gray has argued 
that this argument is not compelling if reasonable computational care is taken. 81 

The representation of Eq. (20) is clearly redundant. The basic idea of the MEKF is that the EKF estimates 
the three- vector 0 while the correctly normalized four-component q provides a globally nonsingular attitude 
representation. If 0 = E{<fi} is the estimate of 0, then Eq. (20) says that 5q(0) ® q is the estimate of the 
true attitude quaternion q. This is equal to q if we remove the redundancy in the attitude representation by 
ensuring that 0 has zero mean so that dq(0) = <5q(0) is the identity quaternion. This choice means that 0 is 
a three-component representation of the attitude error and its covariance is the attitude error covariance in 
the body frame. The fundamental advantages of the MEKF are that q is a unit quaternion by definition, the 
covariance matrix has the minimum dimensionality, and the three-vector 0 never approaches a singularity, 
since it represents only small attitude errors. 

Several choices for 0 have been used, 7 including the vector of infinitesimal rotation angles, 80 two times 
the vector part of the quaternion, 6 two times the vector of Rodrigues parameters, 7 * four times the vector of 
MRPs, or the integrated rate parameters (IRPs). 85 All these choices have the small-angle approximation 


<5q(0) = 


0/2 

1 


+ 0 ( 11011 ) 


( 21 ) 


and MEKFs employing them differ only in third order in the measurement updates to the error angle. 7 

The MEKF was first used in the Space Precise Attitude Reference System (SPARS) in 1969, 78,84 was 
later developed for NASA’s Multimission Modular Spacecraft, 5 and has been used for attitude estimation 
on board several NASA spacecraft. It has been discussed in detail in Refs. 6 and 7. The latter reference 
discusses the extension of the MEKF to a second-order filter, following earlier work by Vathsal. 86 


D. Additive EKF 

An AEKF uses a nonsingular parameterization of the attitude in the filter’s state vector. Almost all AEKFs 
have employed the quaternion, 12 " 17 but the attitude matrix itself has also been employed. 87 We will only 
discuss the quaternion EKF. 

The quaternion AEKF relaxes the quaternion normalization condition and treats the four components 
of the quaternion as independent parameters. It defines the estimate q and error Aq by 

q = E{ q | y} and Aq = q - q (22) 

This means that 

£{||q|| 2 I y} = E{ ||q + Aq|| 2 |y} = ||q|| 2 + £{||Aq|| 2 | y) > || q || 2 (23) 

where y denotes the measurement vector. The equality in Eq. (23) is valid only if Aq is identically zero. 
Equation (23) shows that if the random variable q has unit norm and is not error-free, the norm of its 
expectation must be less than unity. The usual expression for the attitude matrix as a homogenous quadratic 
function of the quaternion gives an orthogonal matrix only if the quaternion has unit norm. Many AEKFs 
have used the homogenous quadratic form, which means that the attitude matrix is only approximately 
orthogonal. It can be shown to approach orthogonality as the filter converges, however. 12-14 

The attitude matrix is guaranteed to be orthogonal if it is computed using the normalized quaternion 
q/||q|| in the homogenous quadratic form, giving 

A fi (q) = llqir 2 {(d - Hell 2 ) ^3x3 + 2 qq t - 2 g 4 [ex]} (24) 

The subscript R identifies this as the ray representation model, since any quaternion along a ray in Euclidean 
quaternion space (a straight line through the origin) represents the same attitude, with the exception of the 
zero quaternion at the origin. This is also known as the linearized orthogonalized matrix (LOM) model. 13, 14 
The ray representation form of the AEKF, which is effectively equivalent to the MEKF in the limit of contin- 
uous measurements, 72 has been applied to attitude estimation of the ALEXIS and CAPER spacecraft. 15, 16 
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Choukroun, Bar-Itzhack, and Oshman formulate a measurement model based on the matrix factorization 
of the attitude matrix 17 shown in Eq. (4). This gives a measurement model that is linear in the quaternion; 
but with state-dependent measurement noise, 

y = H(q)(b + v)-^(q)r= ^ + ^ n ^ < l + 5 ( t l) v ( 25 ) 

~(b-r) J 0 

where v is the measurement noise, which may or may not correspond to a unit-vector observation. This 
model gives the same covariance and state propagation as the ray representation AEKF, in the limit of 
continuous measurements, 72 but has subtle differences for discrete measurements. 

The relative merits of the AEKF and the MEKF have been discussed at length. 8 9 ’ 10,72,88 One disadvantage 
of the AEKF is that its covariance matrix includes elements expressing the variance of the quaternion norm 
uncertainty and the correlation of the norm uncertainty with all other estimated parameters. These terms, 
which are not present in the MEKF, are neither conceptually nor computationally desirable. 

E. Backwards-Smoothing EKF 

The A; th step in a nonlinear filtering problem can be posed as a maximum a posteriori probability (MAP) 
estimation problem by writing the PDF as p k = exp (—Jk) with the loss function 

1 k ~ l 

Jk = 2 E {[y^ 1 - h i+i( x i+i)f R i+1 [yi+1 - h i+ 1 (x i+ i)] + wfQ^Wi j 

2=0 (26) 

+ ^(x 0 - x 0 ) T P 0 _1 (x 0 - x 0 ) 

The MAP estimate Xk is the vector Xk that, along with x* and process noise w* for i = 0, 1, . . . , k — 1, 
minimizes Jk subject to the dynamics equation 

x m = fi(xi, Wi) for i = 0, 1, . . . , k - 1 (27) 

The process noise covariance is Qi , the measurement noise covariance is R % , yz+i is the measurement at 
time U+ 1 , hi+i(xi +1 ) is the nonlinear measurement model, and x 0 is the a priori estimate of the state with 
covariance Po- It can be seen that the size of this problem grows with k. The usual EKF avoids this growth 
by not explicitly recomputing the values of x* for i < k when x* is optimized in the & th step. The iterated 
EKF improves upon the EKF by iterating the nonlinear measurement update equation for x/c, re-linearizing 
about the updated state estimate at each iteration, 1,89 but it does not explicitly recompute the values of x* 
for i < k. Any Kalman filter implicitly recomputes the past state estimates at a new measurement update; 
but this point is often overlooked because estimates in the past are generally of no interest. For linear 
dynamics and measurements, these past estimates are optimal, but they are not optimal with nonlinear 
dynamics or measurements. Thus the EKF linearizations of the past measurements and dynamics are not 
about the optimal estimates. 

The backwards-smoothing EKF (BSEKF), or super-iterated EKF, 18 improves on the iterated EKF by 
relinearizing a finite number of measurements in the past when a new measurement is processed. The BSEKF 
therefore combines some of the properties of an EKF, a smoother, and a sliding-batch estimator. It finds x^ 
along with x* and w* for i = k — m(/c), . . . , k — 1 to minimize the loss function 

1 k—l 

Jk = r E {[yi+i - h i+i( x i+i)] T ^ t 7 +i [y«+i - h t+i(xi+i)] +wfQ t “ 1 w i | 

i=fc— m(k) (28) 

T ^ [ x fc-m(fc) “ m(fc)] [ x fc~m(fc) “ x /c—m(fc)] 

subject to the dynamics equation 

x<+i = fi(xi, Wi) for i = k - m(k), ... , k - 1 (29) 

The loss function of Eq. (28) retains all of the nonlinearities of the most recent ra(k) stages, but the nonlinear 
effects of all the previous stages are represented by the quadratic second term, which is an approximation 
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to the loss function Jfc-m(fc) for fixed x fc _ m ( fe ) optimized over all the x* and w* for i < k - m(fc). A value 
m tar g et for the number of stages to be retained is chosen to balance accuracy and computational effort. When 
k < retarget* the BSEKF uses m(k) — fc stages, and when k > m targ et, it uses m(k) — m target stages. 

The BSEKF solves nonlinear least-squares problem of Eqs. (28) and (29) using a guarded Gauss-Newton 
method, 90 which works with guesses and increments of x fc _ m(fc) , w fe _ m(fc) , w fc _ m(fc)+1 , w fe _ m(fe)+2 , . . 
and w fc _i. Basing the initial guesses x°_ m(fc) , w°_ m(fc)> w“ )+1 , w°_ m(Jfc)+2 , . . , and w°_, for the A; th 
measurement update on the converged values of the previous update minimizes the number of Gauss-Newton 
iterations required. The state estimates for the j th Gauss-Newton iteration are propagated by 

xf +1 = f<(x£, w{) for i = k~ m(k ), . . . , k - 1 (30) 

The loss function J 3 k for this iteration and the measurement residuals and partial derivative matrices 


Ayi+i = Yi+i - h^ +1 (xi + i), $i = dii/dxLi, Ti = dU/dvti, and Hi = dhi/dxi 


(31) 


are computed , and Ax Jt _ m ( ifc ), Aw fc . m(fc) , Aw fc _ m(fc ) +1 , Aw fe _ m(fc ) +2 , . . and Aw fc _i are found to minimize 
the loss function 

1 k - 1 

J k = 2 [( A y«+i “ H i+1 Ax i+ i) T R-^ (Ay i+ i - H i+1 Ax i+l ) + (wf + Aw i ) T Q~ 1 (w J i + Aw*) 


i=k—m(k) 

+ 2 + “ *k-m{k)\ T [Pk-m{k)] * [ X fc- m (fc) + ^ X k~m(k) ~ x k-m(k)} 


(32) 


subject to the dynamics equation 

Ax i+ i = $iAx* + r^Awi (33) 

This linear fixed-interval smoothing problem is solved using a square-root information matrix method that 
has a forward filtering pass followed by a backwards smoothing pass. 91 Now the guesses are updated by 

X i-m(fc) = X i-m(fc) + a Axfc_ m (fc) (34a) 

w i +1 = w i + ft Awi for i — k — m(fc), . . . , k — 1 (34b) 

x i+i = f i( x i +1 » w i +1 ) for i = k — m(k), . . . , A: — 1 (34c) 

for a = 1, and the updated loss function J^ +1 is calculated. If the loss function has decreased, the next 
Gauss-Newton iteration is computed. If the loss function has increased, the value of a is halved and the 
updates of Eq. (34) are recomputed. The Gauss-Newton iterations are repeated until either termination 
criteria are satisfied or a limiting number of iterations is exceeded. 

The quantities and must be computed to insert into Eq. (28) to execute the next time 

step. Contrary to appearances, these are not the a posteriori state estimate and its covariance at sample 
time tk-m(k)' They are computed from the linearized forward filtering pass during the last Gauss-Newton 
iteration of the smoothing problem that ends at sample k. The details of this computation are in Ref. 18. 

This algorithm was applied to the difficult problem of a tumbling micro-satellite in a circular earth orbit 
with altitude 823 km and inclination 82 deg. The satellite uses only magnetometer measurements sampled 
once every 10 seconds to estimate its attitude and angular rate, which has the initial value of 2 deg/sec. 
Eulerian dynamics were modelled with gravity-gradient torques and white-noise disturbance torques. The 
filter estimates the components of the spacecraft inertia tensor, which is not diagonal, along with the satellite 
attitude and rates. Since the overall scale of the inertia tensor is unobservable, the scale was fixed by assigning 
a very small a priori variance to one of the elements. 

Three cases were simulated. All had the same initial estimate of the inertia tensor elements with errors 
as large as 4.4% of the maximum principal inertia, which resulted in errors in the principal axis orientations 
of up to 44 deg. The first simulation had moderate initial errors of 0.34 deg in attitude and in 9.1% in 
angular rate. The second simulation tested the filter’s ability to converge from large initial errors by starting 
with a 180 deg attitude error and a 77% angular rate error. The third simulation was an even more severe 
test, starting with a rate 9 times the true attitude rate, which causes a 180 deg increase in the attitude error 
during the filter’s 10 second sampling period. 
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The BSEKF was compared with an EKF and an UF. 30 In the case with moderate initial errors, the 
BSEKF was tested with m ta rget values of 10, 20, and 30. The best results for this case were returned by 
the UF and the BSEKF with m target = 30, which both converged to attitude errors of less than 2 deg after 
about 1 hour of operation with the BSEKF being slightly more accurate. The BSEKF with m ta rget = 20 was 
about 41% less accurate, and the BSEKF with m ta rget = 10 was about 850% less accurate. The performance 
of the EKF fell between that of the BSEKF with m target = 10 and with m ta rget = 20. The accuracy of the 
moment-of-inertia estimation for the different filters followed the same pattern. The poorer performance of 
the BSEKF with m ta rget = 10 compared with the EKF is unexplained. 

Based on these results, only m ta rget = 30 was used in the more stressing simulations, and the BSEKF 
performed as well in these cases as in first case. The EKF had attitude errors of 20 deg even after 2 hours in 
both stressing cases. The performance of the UF was between the EKF and the BSEKF in the second case, 
but it failed to converge at all in the third case. This is believed to be a result of a wrap-around discontinuity 
of the sigma points for the attitude representation used in the particular UF. 

The BSEKF’s performance improvements are obtained at the expense of between 165 and 175 times as 
much processing as the EKF. However, the computation time is only about 18-19 times that of the UF for 
this problem. The improved performance shows the potential of the BSEKF for nonlinear estimation. 


F. Deterministic EKF-Like Estimator 

Nonlinear attitude estimators have been developed based on control design techniques. 19-21 They have 
a structure similar to the EKF, but do not depend on the questionable assumption of white noise. One 
of these estimators has been applied to the attitude determination of the RADCAL satellite using GPS 
measurements . 2 1 

The quaternion kinematic model is given by Eq. (7), but Eq. (14) is replaced by 

u> = u> + {3 + BiWd (35a) 

/3 = B 2 w d (35b) 


where w d is a square-integrable six-component deterministic dynamic disturbance vector, and B\ and J9 2 
are known matrices. It is assumed that BiB% — 03 X 3, although this assumption is not necessary. The 
kinematics can be written with x = [q T f3 T ] T as 


x = f(x) + G(x)wd 


where 


f(x) = 


iE(q)(u>-/3) 

03xl 


and G(x) 


-l E (q ) B i 

Bi 


Continuous measurements are modelled as 


(36) 

(37) 


y = h(x) 4- D(x) w y (38) 

where h(x) is the known m - component output vector, w y is a square-integrable m- component deterministic 
measurement disturbance vector, and D is a known matrix function of x. The H ^ estimation problem is 
defined as follows. Let 

x = f(x) + K(x, t) [y - h(x)] (39) 

with initial value x(0) = x 0 , and define an error function z by 

z = C( x ) - Z T (x) x (40) 


where all the functions and matrices are smooth functions with appropriate dimensions. Then the Hoq 
estimation problem is to choose, for a given 7 > 0 and a given metric N, the function AT(x, t) for which 


||z(r)|j 2 dr <7 2 |^(xo, x 0 ) + ^ [l|w d ( r )|| 2 + ||w y (r)|| 2 ] dr^ 


( 41 ) 
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Standard arguments show that a sufficient condition for the solution of this problem is the existence of a 
non-negative function V(x, x, t) with V(x 0 , x 0 , t) = 7 2 N(x 0 , x 0 ) that satisfies 


J = dV/dt + (dV/dx) f(x) + (dV/dx) (f(x) + K(x, t) [h(x) - h(x)]} + z T z 
+ [(dV/dx) G(x)G T (x) (dV/dxf + (dV/dx) K(x, t)R(x)K T (x, t) (dV/dxf] < 0 

where R(x) = D(x)D T (x). 

For the attitude estimation case', we take £(x) = [0ix3 P T ] T and 


Z(x) = 


so that 


z = 


S(q) 0 4X 3 
03x3 hx3 


-5 r (q)q 

(3-P 


(43) 


(44) 


The first three components of z are the vector part of q<g> q 1 , the estimation error quaternion. 

A significant amount of algebra using some small-error approximations shows that Eq. (42) is satisfied 
for 

V(x, x, t) = 7 2 z t P _1 z (45) 

with the gain matrix 

|tf T (q)l 


K(x, t) = Z(x)P 


C*3> 


R 


-l 


(46) 


where H( q) is the m x 3 sensitivity matrix, which is a function of the estimated quaternion, and the 6x6 
matrix P satisfies the Riccati-like equation 


-P + F(x) P + PF t (x) + P 


\- 2 hx 3 - H T (q)R- l H(q) 

O 

w 

X 

CO 

1 

P + 

'\Qx 

03x3 

03x3 

7 " 2 / 3 x 3 . 

03x3 

Q2 _ 


with 


F(x) = - 


K«-/§)X] ^3X3 

03x3 03x3 

Qi = B\B t and Q 2 = B 2 B 2 


< 0 (47) 

(48a) 

(48b) 


The scalar 7 is a tuning parameter for this estimator. It can be seen that a smaller value of 7 makes it more 
difficult to satisfy Eqs. (41) and (47), and these equations cannot be satisfied at all if 7 is chosen too small. 
As 7 becomes infinitely large, on the other hand, the equality limit of Eq. (47) goes to the usual Riccati 
equation of the Kalman filter, and the deterministic estimator becomes the continuous measurement limit 
oftheMEKF. 6 

G. Interlaced Kalman Filter 

The basic idea of the interlaced Kalman filter 27 ' 29 (IKF) is to replace a nonlinear model by two pseudo-linear 
models and to estimate the state by using two interlaced linear Kalman filters running simultaneously. The 
application of interfaced filters to spacecraft has been to smooth noisy gyro measurements 27, 28 or to estimate 
attitude rates in the absence of gyro data. 29 

These applications begin with the dynamics equation for a rigid spacecraft equipped with reaction 
wheels 92 

Ju + h + + h) = T (49) 

where J is the moment-of-inertia tensor, h is the reaction wheel angular momentum, and T is the external 
torque, which includes disturbances and magnetic and propulsive control torques, but not reaction wheel 
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torques. It is most convenient to analyze the dynamics in the principal axis frame, in which J is diagonal 
and 


w — dJ '(hx]w + G u [W 2 W 3 CJ1U3 UJ 1 LJ 2 ] + 3 1 (T — cij) 


(50) 


with 


G w — 


- J33) 

0 

0 


0 

3 22 1 {33 3 — 3n) 
0 


0 

0 


3 sz( 3 \\ - J22) 


(51) 


This is a nonlinear equation in u), but the interlaced filter converts it into a linear equation by estimating 
X = [a' 2^3 W] W 3 u>iu 2 ] T in a second Kalman filter, and writing 


ci> — J 1 [hx]u> + G u x + 3 *(T — u>) 


(52) 


where x is the estimate obtained in the second filter. The dynamic equation of this filter is 


X = G x u> 

where Cj is the estimate obtained by the first filter and 


0 

0J3 

d) 3 

0 

CJ 2 

U>1 


UJ2 

U>1 

0 


(53) 


(54) 


is given as a nonlinear function of cj by Eq. (50). The IKF comprises two linear Kalman filters running 
simultaneously, one with dynamics given by Eq. (52) to estimate the attitude rate vector cj, and the other 
with dynamics given by Eq. (53) to estimate the products of the components contained in the state vector 
X- 

Reference 27 treated the case of noisy gyro measurements, for which the measurement inputs to the 
two interlaced filters are trivially obtained from the gyro outputs. Both the IKF and the EKF were very 
effective in filtering out the measurement noise for this problem, but the attitude rate estimation errors of 
the interlaced filter were about 25% less on the pitch axis and about 33% less on the roll and yaw axes. The 
interlaced filter ignores correlations in the process noise and the measurement errors in the two filters, but 
it appears that this approximation is less problematical than the linearizations in the EKF. 

Reference 29 treated the more challenging gyroless case, for which rate estimates must be obtained by 
differentiating vector measurements. Differentiating the noiseless equation b* — gives the well-known 
equation 

b* = -cj x bi -f Ar z (55) 

This can be written as a measurement equation for cj 


Yi = CiCj 


(56) 


with 

Yi = b* - Aii and C* = [b*x] (57) 

If two vectors are measured simultaneously then a pointwise estimate of cj can be computed by 


'ci 

t 

yi 

C 2 _ 


/2_ 


(58) 


where the dagger denotes the Moore-Penrose pseudo-inverse. Note that cj is a “measurement” to be used in 
the filter. The generalization to more than two vectors is obvious. Producing inputs to both of the interlaced 
filters from cj can smooth this estimate in the same way that Ref. 27 smoothes gyro measurements. 

Equation (56) can also be used as a measurement for the linear filter for cj. There is no other option if 
only one vector at a time is available. Complications arising from the fact that C* is rank-deficient can be 
avoided by deleting one component of and the corresponding row of Ci. There are two options for obtaining 
measurements for the x filter. The simplest is to simply use products of the angular rate estimates from the 
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other interlaced filter. Independent measurements can be derived at the expense of more computations by 
using second time derivatives of the vector measurements. 

The interlaced filter employing first and second derivatives of magnetometer and sun sensor measurements 
was tested using flight data from the Rossi X-Ray Timing Explorer (RXTE) spacecraft. Gyro data were 
available to assess the filter performance, but were not used in the filter. The data span analyzed included 
a 90 deg maneuver around the z axis at a rate of 0.125 deg/sec. Magnetometer data were available for the 
entire data span, but sun sensor data became available only 204 seconds after the start of the maneuver. 
The maximum rate estimation errors were about 0.03 deg/sec in all three axes with magnetometer data 
only. When sun sensor data became available, the errors about £ axis remained about the same, because 
the sun vector was very close to this axis, but the errors about the x and y axes were reduced by an order of 
magnitude. Adding the cross product of the sun vector and the magnetic field vector as a third measurement 
reduced the z axis errors to 0.004 deg/sec when the sun was in view, and adding the cross-product of the 
magnetic field vector and its time derivative when the sun was unavailable decreased the x and y errors 
to about 0.005 deg/sec during these periods. The details are in Reference 29, which did not include a 
comparison with the EKF for this problem. 

Two other approaches that do use an EKF to estimate angular rates are presented in Ref. 93. The first 
uses differentiated quaternion measurements to yield a coarse rate measurement, which are then fed into two 
different estimators. The other approach using raw quaternion measurements that are fed directly into two 
estimators. The nonlinear part of the rotational dynamics equation, shown in Eq. (49), is decomposed to 
produce a product of an angular-rate dependent matrix and the angular-rate vector itself, which can be used 
in a pseudo-linear Kalman filter. Results using data from the RXTE spacecraft are also shown in Ref. 93. 


IV. QUEST-Based Methods 


Filter QUEST and its variants are based on Wahba’s problem, 94 which is the problem of finding the 
proper orthogonal matrix A that minimizes the loss function 

1 m 

J(A) = -'£ a i \\b i -Ar i f (59) 

1=1 

where a* are non-negative weights and m is the total number of measurements. Writing the loss function as 

J(A) = A 0 — tracers 2 ") (60) 


with 

m 

A 0 = X a * ( 61 ) 

i—l 

and 

771 

£ = X aib,rf (62) 

i=i 

makes it is clear that J(A) is minimized when trace(AB T ) is maximized. This is equivalent to the orthogonal 
Procrustes problem 95 of finding the orthogonal matrix A that is closest to B in the Frobenius (or Euclidean, 
or Schur, or Hilbert-Schmidt) norm, with the proviso that A have the determinant +1. Shuster pointed 

out that the nine components of the attitude profile matrix B contain full information about the three 

attitude degrees of freedom and the six independent components of the angular error covariance matrix, and 
that choosing the weights to be inverse variances, a* — <j ~ 2 makes Wahba’s problem a maximum likelihood 
estimation problem. 96 

There are many algorithms for solving this problem, 97 of which the most useful are Davenport’s q 
method 98 and QUEST. 23 Davenport parameterized the attitude matrix by a unit quaternion, as shown 
by Eq. (4), giving 

trace(AB T ) = q r A"q (63) 

where K is the symmetric traceless matrix 


B + B T — trace(J5)/ 3>< 3 YX= l a <** x r» 
a i*>i x trace (5) 


m 

-^aMborfa) 

t=l 


( 64 ) 
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The relation on the right-hand-side of Eq. (64) can be derived by using the matrix identities in Eqs. (7) and 
(9). Note that the matrices ft and T commute, which has some useful properties." The optimal attitude 
is represented by the quaternion maximizing the right-hand-side of Eq. (63), subject to the unit constraint 
1 1 q| | = 1 . It is not difficult to see that the optimal quaternion is equal to the normalized eigenvector of K 
with the largest eigenvalue, i.e. the solution of 

^Qopt = ^maxQopt 

With Eqs. (60) and (63), this gives the optimized loss function as 

J{A opt ) = Ao-A max (66) 

The very efficient QUEST algorithm is based on Shuster’s observation that A max can be easily obtained by 
a Newton-Raphson iteration starting from Ao as the initial estimate, since Eq. (66) shows that A max is very 
close to A 0 if the optimized loss function is small. In fact, a single iteration is generally sufficient. QUEST 
is less robust than Davenport’s q method in principle, but has proved itself many times over in practical 
applications. 


A. Filter QUEST 

Since a filtering algorithm is usually preferred when observations are obtained over a range of times, Shuster 
proposed the filter QUEST algorithm, 24 based on propagating and updating B : 

mk 

B(tk) = M$3x3 (tk, tk-i)B(t k -i) + flibjrf (67) 

i=l 


where $ 3 x 3 (£fc, tk- 1) is the state transition matrix for the attitude matrix, (x < 1 is a fading memory factor, 
and mfc is the number of observations at time tk • The optimal attitude at time tk is found from B(tk) by 
the QUEST algorithm. Shuster also formulated a smoother on the same basis. 

An alternative sequential algorithm, recursive QUEST or REQUEST, 25 propagates and updates Daven- 
port’s K matrix by 

TTlfc 

K(tk) = /^$4 x4(tki tk-i)K{tk— i)^4x4(^’ tk—i) T ^ ctjKj (b8) 


where $ 4 X 4 (tfc, tk- 1 ) is the quaternion state transition matrix and iQ is the Davenport matrix for a single 
observation: 


b»rf + r;b'f - (b'f ri)J 3x3 (b t x r*) 


K t = 


(69) 


(bt x r i) T 


bf r, 


Filter QUEST and REQUEST are mathematically equivalent, but filter QUEST requires fewer computations. 
Neither has been competitive with an EKF in practice, largely due to the suboptimality of the fading memory 
approximation to the effect of process noise. Computing the fading memory factor by a Kalman-gain-like 
algorithm gives better performance, but sacrifices much of the attractive simplicity of this method. 100 


B. Extended QUEST 

Extended QUEST 26 is an algorithm that solves for the attitude along with additional parameters. This is 
accomplished by finding the attitude quaternion q^ and the vector of auxiliary filter states Xfc, along with 
x/c-i, and the process noise vector Wk-i that minimize the loss function 


J = ^ y>r 2 ||-bi - A(q fc )ri|| 2 + -||i?ww(fc-i)W fc _i|| 2 + — 1| -R qq (fc- 1 ) (qfc - 1 - qfc-i)|| 2 


i=l 

+ — q*-i) + ^xx(fc-i)(xfc_i - Xfc_i)|| 2 

subject to the attitude dynamics equation 

qfc = $(ifc, tk- 15 qfc-i, Xfc_!, Wfc_i)qfc_i 


(70) 


( 71 ) 
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the transition equation for the auxiliary filter states, 

Xfc = f x(tfc, tk- 1 ; qjfe- 1 , x fc _i, w fc _i) (72) 

and the norm constraint ||qfc|| = 1. The vectors qk-i and x/t-i are the a posteriori (or best) estimates of q 
and x at sample time tk- 1 , and the various R matrices are weights in the loss function. 

The minimization employs an extended square-root information filtering algorithm 91 that proceeds in two 
stages per sampling period. The first phase dynamically propagates the a posteriori estimates at stage k — 1 
to compute a priori estimates at stage k. The propagated state estimates use the full nonlinear propagation 
and the mean value, zero, of the process noise, as in the EKF: 

qfc = tk- 1 ; qfc-i, x fc _i, o)q fc _i (73) 

and 

Xfc — f ij q/k-i, Xfc_ 1 ? 0) (74) 

The result of the propagation step is a modified form of the loss function 

2 _ i i 

j (qfc; x fc) = 2 a j ll^i ^(Qfc)**i || + 2 — Qfc)ll T — ||i^ X q(fc) (qfc “ Qfc) + ^xx(fc) ( x fc ~ Xfc)|| (75) 


where the R matrices are obtained by a QR factorization in the propagation step that employs a linearization 
about the a priori estimates at stage k — 1 as in an EKF. Thus Eq. (75) is an approximation to Eq. (70) if 
the dynamics are nonlinear, but is exact for linear dynamics. 

The second phase, the measurement update, is the novel part of extended QUEST. The optimum Xfc is 
easily given by 

( x fc)opt — x fc ~~ ^xx(fc)*^ x n(fe) (Qfc Qfc) (76) 

Substituting this in Eq. (75) and using Eqs. (60), (62) and (69) gives 


^(Qfc, xjt) = -q£ Q ^ a i 2 Ki j qfc + ~ # qq (fc)(Qfc - qfc)] [tf qq (fc)(Qfc “ qfc) 


\i= 1 


(77) 


Minimizing this loss function gives the best estimate qfc. The minimization differs from Wahba’s problem 
because of the linear terms in qfc. Substituting qfc into Eq. (76) gives 

x fc — Xfc £ xx( fc)iW)(qfc Qfc) (78) 


The remaining step is to express the optimized loss function in the form of Eq. (70) to prepare for the next 
step in the recursion. The details are in Ref. 26. 

This algorithm was tested in a simulation of a rigid spacecraft in near-earth orbit equipped with a star 
tracker and a rate gyro. Eulerian dynamics were modelled with gravity-gradient torques and white-noise 
disturbance torques. In one scenario, the spacecraft was spinning and nutating with spin period of 50 sec 
and nutation period of 504 sec. The other simulation was of a nadir- pointing spacecraft undergoing libration 
at roughly orbit frequency. The extended QUEST algorithm was shown to perform significantly better than 
the EKF, at about 2.6 times the computational expense. 


V. Two-Step Attitude Estimator 

The two-step optimal estimator was proposed in 1995 as an alternative to the standard EKF. 101 An 
implementation of the two-step optimal estimation for recursive spacecraft attitude estimation is presented 
in Ref. 102. Simulations show dramatic improvement of the two-step attitude estimator over the traditional 
EKF and the extended QUEST. 102 

The general cost function of the two-step optimal estimator is given by 
1 k ~ l 

Jk = 2 E \[y <+1 “ h <+1 (x i+1 )f K+i [y»+i - h i+ i(x i+ i)] + wfQ^Wi} 

z i - o (79) 

+ l(xo - xo) T P 0 _ 1 (x 0 - x 0 ) 
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subject to the dynamics equation 


Xi + i = fi(x<, w<) for i = 0, 1, . . . , k - 1 (80) 

The process noise covariance is Qi , y^+i is the measurement with covariance Ri+i, hi+i(xi+i) is the nonlinear 
measurement model, and xq is the a priori estimate of the state with covariance Po. The state estimate 
that minimizes the above cost function is the MAP estimate if the process noise, the measurement noise, 
and the initial estimate error satisfy the Gaussian assumption. 

The main point of the two-step optimal estimator is to define a first-step state y = F{x) in which the 
nonlinear measurement model is linear, i.e. y = h(x)+v = Tiy ~ f v, where v is the measurement noise, so that 
a linear measurement update of the first-step state can be applied. The first-step state y relates the desired or 
second-step state x and the measurement y through the nonlinear mapping T and the linear “measurement” 
matrix H , respectively. All the information contained in the initial guess and the measurements is fused in 
the first-step state using a Kalman filter that performs a nonlinear and sometimes higher-order time update 
and a linear measurement update. The initial mean j^o and covariance Vo of the first-step state can be 
obtained from x 0 and Po, e.g. using a Monte Carlo approach. In general, (x 0 — x 0 ) r P 0 ~ 1 (xo — x 0 ) is not 
equivalent to (^o — To) r 7 > o~ 1 (3 ; o - 5^) because of the nonlinear relationship between x and y. The desired 
state estimate x^ is obtained from the first-step state estimate yk and its error covariance matrix Vk as the 
minimum of the cost function 


j* = l [y* - ^(x)] T v [& - ^(x)] (si) 

The constraints in Xk may also be included by use of the Lagrangian multiplier method. A numerical 
least-squares algorithm such as the Gaussian-Newton or Levenberg-Marquardt method is used to solve the 
second-step minimization problem. Note that in order to guarantee the uniqueness of the solution, the size 
of the first-step state should in general be larger than or equal to that of the desired state and the covariance 
matrix of the first-step state should be positive definite. When there is no dynamics, i.e. x/- + i = x*, or 
initial guess, Eq. (79) reduces to 

Jk = | [y< ~ Mx)] r R - 1 [y i - hi(x)] ( 82 ) 


The two-step attitude estimator for this special case amounts to 1) formulating the cost function in terms of 
Yi and R z in an equivalent form in terms of and Vk that are obtained with a linear least-squares scheme 
and 2) obtaining Xk by solving the converted cost function. The first-step estimate yk and the covariance 
matrix Vk are simply an equivalent representation of all the measurements. 

The desired or second-step state of the two-step attitude estimator is the attitude quaternion (for attitude- 
only estimation) or the attitude quaternion and the gyro biases (for attitude and gyro bias estimation). The 
assumed attitude measurement model is the unit vector model, which is independent of the gyro biases, 
linear in the attitude matrix, and quadratic in the attitude quaternion. With this measurement model, the 
first-step state for attitude-only estimation can be chosen as 

y = [An, A12, A13, A21, ..., A 33 ] t ( 83 ) 


where A i3 , i,j = 1,2,3 are the elements of the attitude matrix. The measurement matrix H that relates y 
in Eq. (83) to a unit vector observation is given by 


r T 

0lx3 

0 

x 

GJ 

*1x3 

T 

r 1 

0lx3 

hx3 

0lx3 

T 

r 1 


(84) 


where r is the representation of the unit vector in the reference frame. The first-step state is expanded to 
include the gyro biases when both the attitude and the gyro biases are to be estimated. 

The attitude part of the augmented first-step estimate may be understood as the usual arithmetic mean 
of the attitude matrix in the space of 3 x 3 matrices, which is not an orthogonal matrix in general. As 
Kasdin and Weaver noted, this first-step estimate is not to be used as an estimate of attitude because the 
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six constraints in the attitude matrix have not been accounted for. 102 The purpose of the first step is to 
optimally filter the vector measurement sensor noise. It is not until the second-step minimization that the 
best attitude estimate (as a unit quaternion) becomes available. The details of the second-step minimization 
of the two-step attitude estimator are in Ref. 102. 

The dynamic propagation equation of the attitude part of the first-step state, y in Eq. (83), is given by 


where fi(cj) is given by 


fi(w) = 


y(t) = si(oj)y(t) 


(85) 

03x3 

^3^3x3 

”<^2^3x3 


“W 3 I 3X 3 

03x3 

^1^3x3 

(86) 

^2^3x3 — ^1^3X3 

o 

CO 

X 

CO 



Because the attitude quaternion does not appear in the above equation, the corresponding time update of 
the first-step estimate is not a function of the unit quaternion estimate. Therefore, there is no need to find 
the unit quaternion estimate in order to process the filter. Only when the attitude estimate is desired does 
the second-step minimization need to be computed. In most of the two-step optimal estimators, however, 
the dynamics for the first-step state is explicitly dependent on the second-step state in a nonlinear fashion. 
Consequently, the time update of the first-step state is complex and computationally expensive. 102 

When the angular rate u) in Eq. (85) is perfect known, only a linear Kalman filter needs to be implemented 
for the first-step state, i.e. y in Eq. (83). When the angular rate is obtained from noisy gyro measurements 
and both the attitude and the gyro biases are to be estimated, the augmented first-step state dynamics 
becomes nonlinear and a nonlinear time-update scheme should be implemented for the first-step state because 
of the state-dependent process noise and the product terms of the entries of the attitude matrix and the 
gyro biases in the dynamic propagation equation. 

The two-step attitude estimator for attitude-only estimation consists of: 


1 . Time update of the first-step estimate as a nonorthogonal matrix. 

2. Measurement update of the first-step estimate as a nonorthogonal matrix. 

3. Second-step minimization for the unit quaternion or the orthogonal attitude matrix (run on demand). 

It has similar properties with the recursive attitude estimation algorithms in Ref. 87, although the gyro 
measurement and vector measurement models used therein are slightly different. In Ref. 87, the algorithm 
without orthogonalization corresponds to the first two steps only; the algorithm with orthogonalization 
corresponds to the first two steps followed by an iterative orthogonalization procedure, which is run at every 
measurement update. The convergence of the two algorithms in Ref. 87 is not always assured when they are 
initialized with the identity matrix. Whether the two-step attitude estimator has guaranteed convergence in 
the same scenario is unclear. 


VI. Unscented Filtering 

The Unscented filter 30 ’ 31 works on the premise that with a fixed number of parameters it should be easier 
to approximate a Gaussian distribution than to approximate an arbitrary nonlinear function. The filter is 
derived for discrete-time nonlinear equations, where the system model is given by 

Xfc+i = f(x fc , k) + w fc (87a) 

9 k = h(x fc) k) + v fe (87b) 

Note that a continuous-time model can always be written using Eq. (87a) through an appropriate numerical 
integration scheme. It is again assumed that and v fc are zero-mean Gaussian noise processes with 
covariances given by Qk and respectively. The Kalman gain and updated covariance are rewritten in 
the form given by 103 

Kk = Pk y {PD~ l (s&O 

Pit = P k - KhPk’Kl (88b) 
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where and P £ are the updated and propagated covariances, respectively, P£ v is the covariance of the 

innovations process yfc — h(xj~ , k) with denoting the propagated state, and P£ y is the cross-correlation 
matrix between x^ and h(x^,fc). The general formulation for the propagation equations and computation 
of the gain matrix is given by computing the following sigma points: 


(?k 


2 n columns from ±7 y P£ 

Xk( 0) = x* 

Xk(i) = <x fc (i) +x+ 


(89a) 

(89b) 

(89c) 


where 7 is a design parameter and y P£ denotes the matrix square root of P £\ 95 Due to the symmetric 
nature of this set, its odd central moments are zero, so its first three moments are the same as the original 
Gaussian distribution. The transformed set of sigma points are evaluated for each of the points by 

Xk+i{i) = i(Xk{i),k) for i = 0, 1, . . . , 2n (90) 


The predicted mean at times tk + 1 for the state estimate is calculated using a weighted sum of the points 
Xk+ i(*)> which is given by 

2n 

*fc + i = E w r an xfc+i(o (9i) 

i~ 0 

where W™ ea,n is a weighting parameter. The output covariance and cross-correlation are computed by 

2 n 

p kl i = 53 w i° v - y£+i] hk+i (*) - yr+i3 T 

i~0 
2 n 

P fe+i = 53 ^ OV [Xfe+iW - *fc+i] hk+i(i) - Yfe +1 ] T 

i~ 0 

where W? ov is another weighting parameter and 

7fc+i(*) = MXfe+iW, k + 1) (93) 

Since the measurement noise appears linearly in Eq. (87b), then = Pk'+\ + p k+i- Methods to han- 
die process noise for the computation of the propagated covariance and to handle nonlinearly appearing 
measurement noise are discussed in Ref. 31. 

As with the standard EKF, using the UF directly with a quaternion parameterization of the attitude 
yields a nonunit quaternion estimate, as seen by Eq. (91). To overcome this problem an unconstrained 
three-component vector is used, based on the generalized Rodrigues parameters 77 (GRPs) to represent an 
attitude error quaternion. 33 The algorithm is called the UnScented QUaternion Estimator, or USQUE. The 
state vector includes the attitude error and bias vectors: 


(92a) 

(92b) 


Xk( o) = X+ = 



(94) 


where the estimated error-GRP Spk is used to propagate and update a nominal quaternion. This estimate, 
as well as the corresponding sigma points, can be used to form error quaternions, denoted by Sq^ (i), through 
a simple transformation from GRPs to quaternions. Then, the following quaternions are computed: 


qfc(o) = q£ 


(95a) 

(95b) 


A reset of the attitude error to zero after the previous update is required, which is used to move information 
from one part of the estimate to another part. 7 This reset rotates the reference frame for the covariance, so 
we might expect the covariance to be rotated, even though no new information is added. But the covariance 
depends on the assumed statistics of the measurements, not on the actual measurements. Therefore, since 
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the update is zero-mean, the mean rotation caused by the reset is actually zero, so the covariance is in fact 
not affected by the reset. The quaternions in Eq. (95) are propagated using Eq. (7) with the estimated 
angular rate. The propagated error-quaternions are then computed using 

= qfc+i(*)® [qfc+i(°)] _1 > i = o, 12 (96) 

Note that <5q^ +1 (0) is the identity quaternion. Finally, the propagated error-GRPs are computed using a 
simple transformation from quaternions to GRPs. The error-GRPs can be propagated directly, however 
this approach requires the integration of nonlinear equations. The advantage of converting the GRPs to 
quaternions is that a closed-form discrete-time solution exists for Eq. (7). 

A comparison between the standard EKF approach of Ref. 6 and the USQUE algorithm has been shown 
in Ref. 33. The simulation involves determining the attitude of a low-Earth pointing spacecraft from mag- 
netometer and gyro measurements only. With good initial attitude estimates, both USQUE and the EKF 
attitude errors agree to within 1 //rad. This indicates that no advantages to using USQUE with small errors 
can be seen. Another simulation involved large attitude errors of —50°, 50° and 160° for each axis, respec- 
tively, with initial bias estimates set to zero. The EKF takes almost 8 hours to converge to a value below 
0.1°, while USQUE converges to this value in under 30 min. Also, the EKF attitude errors do not converge 
to within their respective 3cr error bounds until well after 8 hours. However, the USQUE attitude errors 
do converge to within their respective 3 a well within 8 hours. Another case is shown where the EKF never 
converges, while USQUE does. 

VII. Particle Filters 

There is no such thing as “the PF,” just as there is no such thing as “the EKF.” 104 Particle filters 
comprise a very broad class of suboptimal nonlinear filters based on sequential Monte Carlo simulations 
in which the distributions are approximated by weighted particles (random samples) that are generated 
using pseudo-random number generators. The computational expense and attainable estimation accuracy 
of PFs vary greatly. Numerous theories, improving strategies, and applications of the PFs can be found in 
Refs. 34, 35 and 105. 

A PF allows for a more general discrete-time state-space model than the UF, given by 

Xfc+1 = ffc(x fc ,Ufe,Wfc) (97a) 

y fe = h it(x fe ,v fe ) (97b) 

The process noise w* and the measurement noise are assumed to be white noise processes. The dis- 
tributions of x 0 , wjk, and v*, denoted by p(xo), p(wfc), and p(v*;), respectively, are assumed to be known 
and mutually independent. No Gaussian assumptions about the noises are required. The probabilities 
p(xfc + i|xfc) and p(yjk|xfc) can be derived from the above model and are assumed to be available for sampling 
and evaluation. 

The empirical, discrete approximation of the posterior distribution p(x*;|Yfc) with N weighted particles 

{ x fc ) >' u 4 l) }iii is s iven b y 

N 

Pjv(dx fc |Y/b)« ^u>jj. l) £ x (i)(dxfc) (98) 

i = 1 

where xj^ are the particles drawn from the importance function or proposal distribution, are the 
normalized importance weights, satisfying Y^iLi w k^ ~ 1> and ^ x (*> (dxjt) denotes the Dirac-delta mass located 

in xj^. We use Xfc and Y* to denote the state trajectory {xj}* =0 and measurement history {yj}j-_i, 
respectively. The expectation of a known function f(x*) with respect to p(xfc|Y^) is then approximated by 

(99) 

For example, the approximation to the arithmetic mean of x* is - Crisan and Doucet showed 

that the upper bound on the variance of the estimation error has the form c 0(N~ l ), with c a constant, 105, 106 
whereas Daum argued that c in the upper bound depends heavily on the state vector dimension. 104, 105 
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A PF updates the particle representation (x^ , 
includes 105 


wl 


a recursive manner. A cycle of a generic PF 


• Sequential Importance Sampling 

- For i = iV, sample from the importance function <y(x& + i|Xj[. l \ Y& + i) 

- For i = 1, . . . AT, evaluate and normalize the importance weights 


(i) 

oc w 


« 


pCyfc+iIXfc+tMxfc+JXfcO 

^x^lX^.Yfe+x) 


.(0 | v (*)^ 


( 100 ) 


• Reampling: Multiple/Discard particles {x^| 1 }^ 1 with respect to high/low importance weights u^+x 
to obtain N new particles {x^| 1 }^ =1 with equal weights. 

A popular suboptimal choice of the importance function is g(xfc+i|x£\ Yfc+i) = p(x/c +1 |x^). The PF with 
this importance function is known as the bootstrap filter (BF). Sampling x^| x from p(x/fc+i|xj^) is equivalent 

to the dynamic propagation of x^ to time t k +i in the sampling process. The information contained in the 
measurement yfc+i is not employed. The corresponding update of the importance weight, Eq. (100), has a 
simple form, i.e. oc w^p(y k ^.i\x^ 1 ) y which is appealing especially when the evaluation of p(x^ +1 |x fe ) is 
difficult. Note that this choice becomes inefficient when the overlap between p(xfc+i|xj^) and p(yfc+i|xfc+i) is 
small. In contrast with this simple importance function, the optimal importance function p(x^ + i|x^\y^ +1 ) 
that minimizes the variance of the importance weight conditional upon xj^ and yfc+i fully incorporates 
the latest measurement y/c+i, but usually cannot be evaluated exactly or have samples drawn from it. 

The resampling step alleviates the inherent particle degeneracy of sequential importance sampling, but 
also reduces the number of distinct particles, which is often called the problem of particle impoverishment. 
Simple remedies for the impoverishment problem include roughening and regularization. 105 

Particle filters are superior to conventional nonlinear filters for difficult applications of nonlinear and 
non-Gaussian filtering. A frequently-cited illustrative example in the PF literature is repeated here 34 

x k+ i — + 25 , Xk 2 + 8 cos[1.2(fc + 1)] + w k (101) 

2 1+Zfc 

Vk = w +Vk (102) 

Although the process noise w k , measurement noise v k , and the initial error in the above model are inde- 
pendent Gaussian white noises, the posterior distribution of x k is highly non-Gaussian, and the minimum 
mean-square error (MMSE) estimate could be misleading. In fact, even if the posterior state distribution 
is (eventually) Gaussian-like, conventional filters such as the EKF are usually not guaranteed to estimate 
the mean and covariance correctly. On the other hand, PFs are known for being computationally expensive. 
A main issue of PFs is the computational complexity for high-dimensional systems. For a 100-dimensional 
linear and Gaussian system, a linear KF can give the optimal Bayesian estimate, but the simulation-based 
PF without employing the linear Gaussian structure of the system may be unable to produce useful results. 

Recent applications of PFs in spacecraft attitude estimation are reported in Refs. 36, 37, 107 and 108. 
The PFs employed therein are essentially the BF, in which the attitude is propagated through attitude 
kinematics or dynamics and the measurements are only used to update the importance weights. 

In Ref. 36, a simple BF was designed to simultaneously estimate the attitude and the gyro biases (or the 
attitude and the attitude rate in gyro-less applications). Using the uniform attitude distribution as the initial 
attitude distribution for the case of no initial attitude knowledge and a gradually decreasing measurement 
variance in the computation of the importance weights, it has good convergence properties. The design of 
the decreasing variance is heuristic though. This application showed that even a simple BF can be applied 
to tackle a nontrivial attitude estimation problem. 

In Refs. 37 and 107, the Genetic Algorithm-Embedded Quaternion Particle Filter (GA-QPF) is presented 
for attitude and gyro bias estimation. The adaptive version of the GA-QPF is given in Ref. 108, which 
estimates the measurement noise distribution on the fly, along with the estimation of the spacecraft attitude 
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and gyro biases. The noise distribution estimation scheme is based on an analysis of the filter-generated 
innovations process. 

In Refs. 37,107 and 108 the problem of attitude and gyro bias estimation is divided into two easier prob- 
lems. The resulting GA-QPF has an interlaced structure. The QPF that estimates the attitude quaternion 
for a given gyro bias estimate is interlaced with an external maximum likelihood estimator for the estimation 
of the gyro bias. The likelihood function for gyro bias estimation is approximated as 37 

*2 

II - 0)qj-i] (103) 

j=k i 

where $(•) is the quaternion transition matrix derived from the continuous-time quaternion kinematics and 
qj_! is obtained from the QPF. In the likelihood function, the gyro bias (3 is treated as constant over 
[tk^tki]- The gyro noise model does not need to be incorporated in the likelihood function. The GA 
algorithm maximizes the likelihood sequentially in time. 

The number of attitude particles is dramatically reduced by a simple initialization procedure of the QPF. 
The idea is based on the fact that the first vector observation defines a quaternion of rotation up to one 
degree of freedom. This degree of freedom is used to generate the initial set of particles from the first 
observation only. By so doing, the QPF starting with 1500 quaternion particles reduces the number to 200 
after the first two measurement updates. 

Unlike the filters that work directly with the state mean and covariance, PFs compute the mean and 
covariance as derived quantities of the particle representation The interactions between the 

mean and covariance as in the EKFs no longer exist in PFs, and the issues such as attitude representations 
and attitude error definitions are much less significant in PFs than in EKFs. Three ways of computing the 
attitude estimate have been proposed: 

1. As the usual arithmetic mean of three-component attitude representations such as the MRPs, 36 or 

2. As the attitude particle with the largest importance weight, 108 or 

3. As the minimum of the cost function 108 

£>i o ii ^v^ii 2 ( io4 ) 

i= 1 


subject to A T A — AA T = 73x3- 

The third one is the MMSE estimate in SO(3) and can be computed using the SVD method. 108 The metric 
in SO(3) associated with the cost function is given by 109 


d F (A u A 2 ) = \\A 1 -A 2 \\ (105) 

which is in general not identical with the Riemannian metric in SO(3) corresponding to the length of the 
shortest geodesic curve, given by 109 


dfi(Ai,^) = ||log(^ 2 )|| (106) 

The third one is superior to the other two but is also more computationally expensive. A problem with the 
second way is that the particle with the largest importance weight does not necessarily have the maximum a 
posteriori probability because the importance weight is defined as the ratio between the posterior distribution 
and the importance function. The choice of the importance function may have great impact on the maximum 
of the importance weights and the corresponding attitude estimate. The second way of computing the 
attitude estimate is also known to be inaccurate. The first choice is not invariant under rotations or attitude 
parameterization-independent; it treats the attitude particles in a three-component attitude representation 
like real vectors in the Euclidian space. This approximation is good when the attitude particles are locally 
distributed. 
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VIII. Orthogonal Attitude Filter 


The orthogonal filter 40 represents the attitude by an orthogonal rotation matrix, rather than by some 
parameterization of the rotation matrix. This avoids questions about singularities of representations or 
covariance matrices arising in other filters, and has the additional advantage of providing a consistent ini- 
tialization for a completely unknown initial attitude, owing to the fact that SO(3), the group of rotation 
matrices, is a compact space. 110 The PDF is a non-Gaussian function defined on the Cartesian product of 
SO (3) and the Euclidean space R Ar of bias parameters. The Fokker-Planek equation 39 propagates the PDF 
between measurements and Bayes 5 formula 1 incorporates measurement information. This approach is related 
to earlier work by Daum 38 and Lo. 11 1-1 13 

Three estimates are known to be identical for a Gaussian PDF defined on R^: the conditional expectation 

x M /„ = /xp xJ y v (x) d N x. (107) 


the MAP estimate 
and the MMSE estimate 


= argmax [p x „|Y„(x)] 


MMSE _ „ 

*u/v = argrnm 

x'€R* 


[ l]x-x'|| 2 p X(i |y u (x) d N X 

R N 


(108) 


(109) 


It is also well known that the Fokker-Planek equation for linear dynamics and Bayes’ formula for a linear 
measurement model lead to the usual Kalman filter with a Gaussian PDF. 

We define the non-Gaussian PDF of the attitude on SO (3) by 


Pa^yM) = exp [-Ja„\y v ( a )] 


( 110 ) 


where the negative-log-likelihood function is given by 


Jah\y v {A) = — trace^Jj^) -f constant (111) 

The real 3x3 matrix \ v has the correct number of free parameters to represent the mean and covariance 
of an attitude state, as in Wahba’s problem. In fact, a filter based on this PDF will look very much like 
filter QUEST. The MAP and MMSE attitude estimates for this PDF can be shown to be identical, but 
the conditional expectation of the attitude matrix is not an acceptable attitude estimate, since it is not an 
orthogonal matrix in general. 40 Equation (111) contains the first two terms, £ = 0 and £ = 1, of an expansion 
of the negative-log-likelihood function in the irreducible representations of SO (3). 114 An attitude PDF of 
this form was first considered by Lo, m “ 113 who referred to it as an exponential Fourier density. Lo included 
the higher-order £ = 2 term, but treated neither process noise nor non-attitude bias parameters. 

A PDF describing uncorrelated bias parameters obeying Gaussian statistics and an attitude with a PDF 
specified by Eqs. (110) and (111) would be the exponential of the sum of the right-hand-side of Eq. (Ill) 
and a quadratic function of the bias parameters. The simplest generalization to include correlations between 
the bias vector and the attitude matrix in the orthogonal filter is 


X) = - ^l\v x - trace [ S H„( x Ml + constant 


( 112 ) 


where 


N 

k = 1 


(113) 


where Xk is the fc th component of x. Propagation and update equations for the parameters and 

Bfi\ V) k for k = 0, . . . , N are derived from the Fokker-Planek equation and Bayes 5 formula, respectively. It is 
not easy to establish that the MAP and MMSE estimates are equal for the correlated problem, and it may 
not even be true. Therefore, the method uses the more easily computed MAP estimates, which are found 
by simultaneously satisfying the equations 


A^/y = argmin 

A€SO(3) 


[^A^,x^|y v (A £/i|v)] 


( 114 ) 
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and 


= arg mm 
xgir*' L 


{Ja^-x-hIySAh |v> x ) 

Defining a vector function f] fJL \ v (A) with components given by 

[nn\v(.A)} k = trace(Ej |vfc A) for k = 1, . 
allows us to rewrite Eq. (115) as 


..,N 


(115) 


(116) 


x M/u = argmin { \x T F*> v x - + Vn\v(^\v)] x - trace(BL 0 i^| v ) 

XGR N L z l j 

= [VVfr + 17 m|v(A*|v) 

The attitude estimate A^ v can be found by maximizing trace[£j| v (x).A] for some initial guess for x M |„, using 
one of the algorithms developed for Wahba’s problem, then updating x M | v using Eq. (117), and iterating this 
procedure until it converges. 

The Bayes measurement update equation and Fokker-Planck propagation equation for this filter were 
derived for an Earth-pointing spacecraft equipped with gyros and a three-axis magnetometer. 40 The filter 
developed an instability that is believed to result from the approximations in deriving the Fokker-Planck 
equation. Since this is a nonlinear equation, the £ — 1 term in the PDF leads to £ = 2 terms in the 
propagation, which cannot be accommodated in the filter. It is possible that including an £ = 2 term 
in the PDF would stabilize the orthogonal filter, at the expense of increased complexity. More realistic 
measurement models may also require the £ — 2 irreducible representation to be included in the PDF. If 
£ = 2 terms are included in the PDF, however, the nonlinear Fokker-Planck equation will introduce £ = 3 and 
£ = 4 terms that would have to be ignored. It appears that no PDF including a finite number of irreducible 
representations of SO(3) can provide an exact solution of the Fokker-Planck equation, and it remains to be 
seen if a consistent convergent algorithm of this type can be found. 


(117) 


IX. Predictive Filtering 

In the nonlinear predictive filter it is assumed that the state and output estimates are given by a prelim- 
inary model and a to-be-determined model error vector, given by 115 

x(f) = f [x(t)] + G[x(t)]d(t) (118a) 

y(t) = h[x(f)] (118b) 

where d(f) is the model error, which may include both model variations and external disturbances. A Taylor 
series expansion of the output estimate in Eq. (118b) is given by 

y(t + At) = y (t) + z[x(t), At] + A(At)S[x(f)]d(f) (119) 

where the i th element of z[x(t), At] is given by 





(120) 


where p iy i = 1, 2, . . . , m, is the lowest order of the derivative of hi[k(t)} in which any component of d (t) 
first appears due to successive differentiation and substitution for Xi(i) on the right side, and L 3 j(hi) is an 
i th order Lie derivative. 116 The matrix A(At) is diagonal with elements given by A t Pi /Pi\> and the z th row 
of S[x(£)] is given by 


SiW)} = {L gi [Lj-Hhi) 


i * * * » Lg q 


L v r\ h i) 


( 121 ) 


where q is the number of columns of G[x(t)] and L g . 
is in essence a generalized sensitivity matrix for nonlinear systems 


L V f 1 (h i ) is another Lie derivative. Equation (121) 
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A cost functional consisting of the weighted sum square of the measurement-minus-estimate residuals plus 
the weighted sum square of the model correction term is minimized to determine d(£). This yields 115 ’ 117 

d (t) = ({A(At)5[x(t)]} T J R- 1 {A(A i )5[x(i)]} + M/) _1 

x {A(At)S[x(t)]} T R- 1 {y (t + At) - y(t) - z[x(£), At]} 

where the matrix W serves to weight the amount of model error added to correct the assumed model in 
Eq. (118). As W decreases, more model error is added to correct the model, so that the estimates more 
closely follow the measurements. As W increases, less model error is added, so that the estimates more 
closely follow the propagated model. An optimal W can be computed using an output covariance constraint 
which is that the covariance of the measurement residual matches the actual measurement covariance in a 
statistical sense. 115 Equation (122) is used in Eq. (118a) to perform a nonlinear propagation of the state 
estimates to time t then the measurement is processed at time to find the new d(£) in [£/-, £*.+i]> and 
then the state estimates are propagated to time £fc+ 1 . 

The advantage of the predictive filter over the EKF for attitude estimation applications is that the 
linearization is performed at the output, not at the system dynamics. Therefore, the issue of quaternion 
normalization is never a problem in the predictive filter, since d (t) is used to propagate the quaternion 
kinematics directly. The predictive filter has been successfully tested using actual magnetometer and sun 
sensor data without gyros from the Solar, Anomalous, Magnetospheric Particle Explorer (SAMPEX) space- 
craft. 118 Another advantage of the predictive filter is that it can be used to provide a deterministic solution 
by setting W = 0. For this approach the matrix inverse in Eq. (122) exists only if the minimum number of 
observations is present, e.g. two star measurements for three- axis attitude determination. This approach has 
been used to provide algorithms for the GPS attitude determination problem 119 and for the simultaneous 
attitude/position determination problem from multiple line-of-sight observations. 120 These papers show that 
the predictive filter solution for each of these problems achieves the associated Cramer-Rao lower bound, 
which proves that the predictive filter algorithms are efficient estimators. 

X. Nonlinear Observers 

Many nonlinear observers exist that provide attitude and angular rate estimates. 42-44,46 " 48 Each has its 
various advantages and disadvantages. In this section we focus our attention to the observer designed by 
Thienel and Sanner, 48 since it is the most recent of the aforementioned references. In their approach, the 
measured angular rate of Eq. (14a) is rewritten as 


u> = cj + (3 


(123) 


where the vector /3 is now assumed to be constant. The estimate angular rate is given by <2* = to — /3, 
where /3 is the estimated bias. Observations of the true quaternion, q, with no noise are assumed. The error 
quaternion between the “measured” quaternion and the estimated quaternion follows Eq. (20) with 


Sq = 


SQ 

$<14 


= q® q 


-l 


The nonlinear observer for the quaternion and bias is given by 

q= 1 H(q)A T ( 5 q) [( 2 > + fc<S e sign(< 5 g 4 )] 

P = -^0sign(<fy 4 ) 

where k is any positive constant. The error- dynamics of the observer can be shown to be given by 

<5q = -^H(5q) [A/3 + k <5£sign(<5<j 4 )] 

A/3 = i<50sign(<5 94 ) 


(124) 

(125a) 

(125b) 

(126a) 

(126b) 
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where A/3 = (3 — 0. The equilibrium states for Eq. (126) are <5q = [0 0 0 ± 1] T and A/3 =[0 0 0] T . 

In order to prove global stability of the observer, the following candidate Lyapunov function is chosen: 


v = l AB T A3 l ({5 q 4-l) 2 + de T Sg, Sq 4 > 0 

2 2 1(5^4 + l) 2 + 5q t 5q, Sq 4 <0 

Taking the time derivative of Eq. (127) and using Eq. (126) leads to 


(127) 


V = A8 e T S e . (128) 

for all t. This establishes that A/3, Sq and £<74 are globally uniformly bounded. The second derivative is 
given by 

V = ^ Sg T (Sq 4 hx 3 + [5gx]) [A/3 + k80sign(5q 4 )) (129) 

which is also bounded. Barbalat’s lemma 121 then shows that \\Sq\\ — > 0 as t oo. 

Simulation results in Ref. 48 have shown that the nonlinear observer works well. Thienel and Sanner also 
have considered the effects of using the more realistic gyro model given by Eq. (14). The bias estimates can 
be shown to be exponentially convergent to a root-mean-square (RMS) bound given by 


II A/3||rms < V3c yj -a 2 + a 2 (130) 

where c is a finite constant which depends on the spacecraft inertia and desired rates. Coupling the observer 
with a nonlinear controller gives attitude tracking errors that converge asymptotically to the RMS bound 


||5p||rms < 




(131) 


where kp and A are control law gains, and 7 is a bounded variable. Simulation results with noise show that 
the errors are within a ball determined by the variance of the noise. 

Nonlinear observers are especially useful since they are often accompanied with global stability proofs. For 
spacecraft attitude estimation applications having the property of guaranteed convergence from any initial 
condition is especially desired by designers. Still, nonlinear observers are still in their infancy. The observers 
in Refs. 42-48 all require an attitude measurement, which limits their use to cases where a deterministic 
attitude is known. Still, these methods show great promise for future applications. 


XI. Adaptive Methods 

Many adaptive methods exist that either update noise covariances in a filter design, 17,50-52 or update 
model parameters through least-squares techniques 54-57 or by using nonlinear techniques. 58-60 In this section 
we focus on a noise adaptive approach using a quaternion Kalman filter, 17 and a nonlinear adaptive approach 
to determine both inertia parameters and constant disturbance. 58 


A. A Noise Adaptive Approach 


The adaptive filter described in this section is based on a linear pseudo-measurement, given by Eq. (25). 
The adaptive filter processes the measurement residuals to optimally compensate for system errors. The 
case of process noise adaptive estimation is considered in Ref. 17. The process noise covariance matrix is 
assumed to be a scalar times identity matrix = 77/3x3, where 77 is the to-be-estimated parameter using 
an adaptive scheme. The process noise covariance is initialized using Qk = /3x3- Denoting the sensitivity 
matrix that multiplies q in Eq. (25) as H , the i - step residual is given by 

v i = qj. (132) 


for i = l, 
estimate, 
predicted 


2, . . . , m, where H x k is computed from a single measurement and is the propagated quaternion 
Note that Eq. (132) implies that the pseudo-measurement is zero. 17 The sample mean of m 
residuals is defined by 



(133) 
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The squared residual sample mean, denoted by the matrix M m is computed as 

1 m 

= (134) 

m , — , 

and the covariance of u m is denoted by S m . The adaptive procedure determines 77 that solves the following 
minimization problem: 

rnin{J(r 1 ) = \\M m -S m (v)\\ 2 } (135) 

? 7>0 

where the norm || • || denotes the Probenius norm, i.e. ||M || 2 = trace(M M T ). The solution to this mini- 
mization problem is given through a series of steps. First, the following variables are initialized: q£ = q£, 
V k = P£ i = 04 X 4 , = 04 x 4 and M Q = 04 x 4 , where q£ and P k are the updated quaternion estimate 

and covariance, respectively. Then, the following steps are given, starting with % = 1 : 

1. Compute H(qJ. _1 ), where 5 is defined by Eq. (5a). 

2 . Propagate the quaternion and covariance using 


4 = (136a) 

n = ($r l )^r 1 (C 1 ) T (iseb) 

where is the state transition matrix from the quaternion kinematics. 

3. Compute S(q£) and v\ from Eq. (132), along with the following variables 

M i = — -M l ~ l + iK)K) T (137a) 

l % 

n = i[H(q* fe )]^[H(q* fc )] T + al 4x 4 (137b) 

Ay-2 

2* = {($r 1 )[H(qr 1 )][E(qr 1 )] T ($r 1 ) T + [H(qi)][H(q;:)] r } — (137c) 

M* = M i " 1 + (HDVtmf + Hi (137d) 

V = L*- 1 + {H i k )Z i {H i k ) T (137e) 


where R l k is the measurement covariance for the 7 th measurement and a is a small number in order to ensure 
that 1Z\ is nonsingular. The procedure continues through i < m. The estimate for 77 , denoted by 77 , is 
computed by 

- _ trace [(M m - A4 m )(L m ) T ] 

^ trace [(L m )(L m ) T ] 

Then, the process noise covariance is updated. This covariance is then converted into a 4 x 4 matrix using 
the matrix S, which is then used in the additive filter after the normalization stage. Simulation results from 
Ref. IT show that the adaptive procedure is able to successfully update the covariance matrix for very large 
gyro bias errors. 


B. A Nonlinear Adaptive Approach 

The nonlinear adaptive approach described in this section estimates both the inertia matrix components and 
a constant vector of unknown disturbances . 58 The kinematics are given by the vector of MRPs , 62 denoted 
by p, and Euler’s dynamics equations : 122 


p = 1 {(1 - p T p) hx3 +2[px] + 2p p T } U) = 1b(p)oj 
Ju> 4- [u>x]j7u> — u + F 


(139a) 

(139b) 
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where F is the external vector of disturbances, which is assumed constant. Observations of p and u> are 
assumed with no noise. The adaptive control law is given by 


u = 


\L : 3 : F 


g 

V 

l-ll 


= Qx 


g = [uj\ U>2 ^3 ^1^2 <^2<^3 ViU)zY 


p = —K v (jJ — 


+ 


(t 


4tf„ 


M 


+ 


^3x3 


P -AKiB 


■ X (P) f 

Jo 


p dt 


(140a) 

(140b) 

(140c) 


where J is the estimated inertia matrix, F is the estimated disturbance, K p , K v and Ki are scalar control 
gains, and L = [Li : Z, 2 ] is a matrix of estimated inertia components, with 


0 

J23 

-J23 


Jl3 J33 ~ J22 -Jl2 

—J\z 

0 

Ju 

, l 2 = 

—J 23 J\2 J\\ ~ J 33 


-Jl2 

0 


1 

CO 

CO 

1 

1 


The adaptive update law is given by 

& = -iB T (p)5 3 ex T r 

where T is a 10 x 10 matrix of learning design parameters, and e is given by 


Jo (P Pr) dt 



P-Pr 
P Pr 


(141) 


(142) 


(143) 


where p r and p r are reference trajectories. The matrix S3 is a 3 x 9 sub-matrix of the 9x9 matrix 
S = [Si : S 2 : 53], which is determined by solving the following Lyapunov equation: 

SE + E t S=-D (144) 


where 


E = 


03x3 

03x3 

—Kihxz 


hx3 

03x3 

— K p l 3x3 


03x3 

hx3 

—Kvhxs 


(145) 


Once a matrix D is chosen, then the positive definite matrix S is determined numerically. The stability of 
the adaptive control law is proven using a Lyapunov analysis. 

Clearly, from the definition of L 2 in Eq. (141) the inertia matrix cannot be uniquely determined. Rather, 
the inertia matrix terms and products, using this redundant formulation to make the other parameters 
appear linear, are estimated such that the closed-loop dynamics assumes a prescribed linear form. The 
inertia matrix term adaptation is to enforce this desired linear closed loop dynamics, not to actually identify 
the true inertia terms. Due to the redundancy of the inertia terms, there is an infinity of solutions of these 
inertia terms which will all yield the desired behavior. Simulation results from Ref. 58 indicate that the 
adaptive control law is able to provide robust tracking in the presence of large system uncertainty. 


XII. Conclusions 

Many nonlinear filtering methods have been applied to the problem of spacecraft attitude determination 
in the past 25 years. This paper has provided a survey of the methods that its authors consider to be 
most promising. It remains the case, however, that the extended Kalman filter, especially in the form 
known as the multiplicative extended Kalman filter, remains the method of choice for the great majority of 
applications. It is a relatively simple and flexible tool with extensive heritage that can incorporate a great 
variety of measurements. The extended Kalman filter can fail in cases that have highly nonlinear dynamics or 
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measurement models, or that lack a good a priori estimate of the state. Spacecraft engineers usually employ 
conservative designs with good initial estimates for filters, but increasingly powerful processors promote 
increased spacecraft autonomy, including autonomous initialization of attitude estimation filters. 

Sigma-point filters are an attractive alternative for applications where the nonlinearities of the dynamics 
model or of the measurement models are severe, or when a good a priori estimate of the state is unavailable. 
These filters require the probability density function to be approximately Gaussian, as the Law of Large 
Numbers leads us to expect in all but pathological cases, and to be at worst unimodal. Sigma-point filters are 
especially attractive when it is difficult or impossible to compute analytic partial derivatives of the dynamics 
or measurement models. They also have the advantage of being well suited to parallel computation. The 
backwards-smoothing extended Kalman filter has shown promise in situations similar to those for which 
sigma point filters are indicated. The computational burden of the backwards-smoothing extended Kalman 
filter is such that it is probably not competitive with a well-designed sigma-point filter in most cases, though. 

Particle filters are the only recourse when the density function is significantly non-Gaussian, especially 
if it is multimodal. Particle filters face the curse of dimensionality if more than a few parameters are to 
be estimated, however. Creative ways around this problem coupled with increases in computing power may 
make particle filters more generally useful in the future, but they are confined to niche applications at the 
present. 

Some of the attitude estimation approaches presented in this paper, such as filter QUEST and recursive 
QUEST, are simple linear or pseudo-linear filters but are suboptimal compared to the standard extended 
Kalman filter. These are however useful for spacecraft contingency designs in case of anomalies or as 
simple tools for analysis purposes. The other nonlinear filters discussed in this paper are mostly academic 
exercises, but they may provide the basis for applications as of now unforeseen. The orthogonal attitude 
filter represents the first approach to a truly nonlinear filter, but the theory is still not complete for attitude 
estimation. The predictive filter is useful for deterministic estimation, but its advantages over the other 
filtering approaches have not been shown yet. Nonlinear observers are attractive since they usually are 
proven to be asymptotically stable, but due to their infancy they have not yet found widespread use on 
actual spacecraft yet. Adaptive approaches can be useful when system parameters are not known well or in 
the advent of spacecraft failures. 

Although the new approaches surveyed here have been shown to have some advantages, it is wise to apply 
the old adage “if it ain’t broke don’t fix it” to the standard extended Kalman filter, which has proved its 
worth on a multitude of spacecraft missions. Ultimately, future mission requirements coupled with enhanced 
confidence in the new approaches may bring about their greater use for onboard spacecraft applications. 
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